1 Introduction


A graph G is chordal if each cycle of length greater than 3 in G has a chord, that is, an edge between two non-adjacent vertices of the cycle. We present a simple parallel algorithm to test chordality of graphs, based on the parallel Lexicographic Breadth-First Search algorithm. In total, the algorithm takes $O(N)$ time on an $N$-thread machine and performs $O(N^2)$ work, where $N$ is the number of vertices in the graph. Our implementation of the algorithm uses the GPU environment Nvidia CUDA C. The algorithm is implemented in CUDA 4.2 and has been tested on an Nvidia GeForce GTX 560 Ti of compute capability 2.1. At the end of the thesis we present the results achieved by our implementation and compare them with the results achieved by the sequential algorithm.

This thesis is organized as follows. Section 2 is an introduction to parallel programming using the GPU environment Nvidia CUDA C. Section 3 introduces the basic graph definitions used throughout the thesis and then gives an overview of the graph theory related to the LexBFS algorithm and chordal graphs. Section 4 introduces the LexBFS algorithm and its two best-known implementations. Section 5 presents the sequential algorithm to test chordality of graphs and analyzes its correctness and time complexity. In Section 6 we present our parallel LexBFS algorithm and a parallel algorithm to test chordality of graphs. In Section 7 we give the performance results of our parallel implementation compared to the sequential algorithm. In Section 8 we discuss our results and possible further work.
2 CUDA Programming

CUDA (Compute Unified Device Architecture) is a general-purpose parallel computing architecture for Nvidia GPUs. We present the main features of CUDA C used in our implementation. For more details, we refer to the NVIDIA CUDA C Programming Guide [4] and the CUDA C Best Practices Guide [5].

CUDA C extends the C/C++ programming model to a heterogeneous programming model that operates on the CPU, called the host, and on the GPU, called the device. In CUDA, a kernel is a function executed in parallel by many threads on the device. A thread is an independent sequential flow of execution. Threads are grouped into blocks, and blocks are grouped into a grid. Each thread has a unique identifier in the grid, which can be computed within a kernel from the built-in variables threadIdx, blockIdx and blockDim.
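For a one-dimensional launch, the usual global index is blockIdx.x * blockDim.x + threadIdx.x. The sketch below is plain C++, not CUDA: it emulates a 1-D launch on the host to show that this formula gives every thread a distinct identifier. The 1-D layout and the function names are assumptions made for illustration.

```cpp
#include <cassert>
#include <vector>

// Global index of a thread in a 1-D launch. In a CUDA kernel the three
// arguments would be the built-ins blockIdx.x, blockDim.x and threadIdx.x.
int globalThreadId(int blockIdxX, int blockDimX, int threadIdxX) {
    return blockIdxX * blockDimX + threadIdxX;
}

// Host-side emulation of a launch with gridDimX blocks of blockDimX threads:
// visits every (block, thread) pair and records the index it would compute.
std::vector<int> emulateLaunch(int gridDimX, int blockDimX) {
    std::vector<int> ids;
    for (int b = 0; b < gridDimX; ++b)
        for (int t = 0; t < blockDimX; ++t)
            ids.push_back(globalThreadId(b, blockDimX, t));
    return ids;
}
```

When N is not a multiple of blockDim.x, the grid is usually rounded up, so in a real kernel the threads whose global index is at least N should return immediately.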

All threads may access data from the local, shared, constant, texture and global memory. For the texture and constant memory, see [4]. The local memory is private to a thread. The shared memory is common to all threads within the same block and has the same lifetime as the block. All threads have access to the same global memory.

A simple program using the CUDA architecture proceeds as follows: allocate and initialize data on the host, allocate data on the device, transfer data from the host to the device, run the CUDA kernels on the device and transfer the results from the device back to the host.

The CUDA architecture allows synchronizing the threads of one block with the __syncthreads() function. It works as a barrier: the threads that reach that point in the code wait for the other threads of the block that have not reached it yet.

One method to synchronize threads across blocks is to split the computation at the synchronization points and to run each piece as a separate kernel. We use this method in our work.
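The following plain C++ sketch emulates this method on the host. Each hypothetical "kernel" is a loop over thread indices, and the return from one loop before the next one starts plays the role of the grid-wide synchronization point between two launches. The example (rotating an array by one position) is our own illustration, not part of the implementation.

```cpp
#include <cassert>
#include <vector>

// Kernel A (emulated): every thread t reads its right neighbor into tmp[t].
void kernelRead(const std::vector<int>& a, std::vector<int>& tmp) {
    const int n = static_cast<int>(a.size());
    for (int t = 0; t < n; ++t) tmp[t] = a[(t + 1) % n];
}

// Kernel B (emulated): every thread t writes only its own slot. It starts
// after kernel A has finished, so no value is overwritten before it is read.
void kernelWrite(std::vector<int>& a, const std::vector<int>& tmp) {
    const int n = static_cast<int>(a.size());
    for (int t = 0; t < n; ++t) a[t] = tmp[t];
}

// Rotates a left by one position using two "launches"; the boundary between
// them acts as a barrier across all blocks, which __syncthreads() cannot give.
void rotateLeftTwoKernels(std::vector<int>& a) {
    std::vector<int> tmp(a.size());
    kernelRead(a, tmp);   // launch 1
    kernelWrite(a, tmp);  // launch 2 begins only after launch 1 completes
}
```

In CUDA, kernels launched into the same stream run one after another, which is what makes this split safe. Doing the read and the write in a single kernel would let thread t overwrite a[t] before thread t - 1, possibly in another block, has read it.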

3 Background 

3.1 Basic graph definitions 

We introduce the following terminology, used throughout this thesis. Let $G = (V, E)$ be an undirected graph with the vertex set $V$ and the edge set $E$, where $E$ consists of unordered pairs of vertices in $V$. We denote the size of $V$ by $N$ and the size of $E$ by $M$. If $(u, v) \in E$, then we abbreviate it to $uv$. We use $N_x$ to denote the neighborhood of a vertex $x \in V$, excluding $x$.

Let $\mathbb{N}$ be the set of natural numbers and let $label_x$ be the label of $x$, where $label_x$ is a string over the alphabet $\mathbb{N}$. We use $\circ$ to denote the concatenation operator for labels.

A bijection $\pi : \{1, 2, \ldots, N\} \to V$ is called an ordering of $G$. Let $\pi^{-1}$ denote the inverse of $\pi$; thus $\pi^{-1}(v)$ is the index of $v$ in the ordering of $G$. Let $\pi = v_1, \ldots, v_N$ be an ordering of $G$. We use $LN_{v_i}$ to denote the neighborhood of $v_i$ in the subgraph induced by $v_1, \ldots, v_{i-1}$.
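To make the notation concrete, here is a minimal C++ sketch that computes $\pi^{-1}$ and the sets $LN_v$ for a given ordering. It assumes 0-based vertex indices and adjacency lists, and the function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

using Graph = std::vector<std::vector<int>>;  // adjacency lists, vertices 0..N-1

// pi[i] is the vertex at position i; returns pos with pos[v] = index of v.
std::vector<int> inverseOrder(const std::vector<int>& pi) {
    std::vector<int> pos(pi.size());
    for (int i = 0; i < static_cast<int>(pi.size()); ++i) pos[pi[i]] = i;
    return pos;
}

// ln[v] = neighbors of v placed before v in the ordering, i.e. the
// neighborhood of v in the subgraph induced by the vertices preceding v.
Graph leftNeighborhoods(const Graph& g, const std::vector<int>& pi) {
    std::vector<int> pos = inverseOrder(pi);
    Graph ln(g.size());
    for (int v = 0; v < static_cast<int>(g.size()); ++v)
        for (int u : g[v])
            if (pos[u] < pos[v]) ln[v].push_back(u);
    return ln;
}
```

For the path 0-1-2 and the ordering 2, 0, 1, vertex 1 comes last, so its left neighborhood is {0, 2}, while the other two left neighborhoods are empty.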

We say that an ordering $\pi$ of $G$ is a BFS order if it is generated by the well-known BFS algorithm (see for example [7]). We present this algorithm in the next section. Note that a graph can have many different BFS orderings.

A graph G is chordal if each cycle of length greater than 3 in G has a chord, that is, an edge between two non-adjacent vertices of the cycle. A vertex $x$ is simplicial if $N_x$ induces a clique. An order $v_1, v_2, \ldots, v_N$ is a perfect elimination order if, for each $i$, $v_i$ is a simplicial vertex in the graph induced by $v_1, v_2, \ldots, v_i$.

3.2 Overview 

Besides the recognition of chordal graphs, the LexBFS algorithm has many other applications. It is used as a part of many graph algorithms, such as recognizing interval graphs or computing transitive orientations of comparability graphs, co-comparability graphs and interval graphs.

An orientation of an undirected graph G is a directed graph created by assigning a direction to each edge. An orientation is acyclic if it does not contain a directed cycle. An orientation is transitive if, for all $x, y, z$, whenever $x \to y$ and $y \to z$ are edges, $x \to z$ is also an edge. A comparability graph is an undirected graph that has an acyclic transitive orientation of its edges. A co-comparability graph is a graph $G$ whose complement $\overline{G}$ is a comparability graph. An interval graph is an undirected graph that is the intersection graph of intervals on the real line, i.e. $G = (V, E)$, where $V = \{I_1, \ldots, I_N\}$, each $I_i$ is an interval on the real line, and $(I_i, I_j) \in E \iff I_i \cap I_j \neq \emptyset$.

Gilmore and Hoffman [1] proved that a graph is an interval graph if and only if it is both a chordal graph and a co-comparability graph (Figure 1).



Figure 1: From left: $C_4$ is a co-comparability graph that is not an interval graph. The next graph is a chordal graph that is not an interval graph.
Habib, McConnell, Paul and Viennot [2] gave an $O(N + M \log N)$ algorithm for the transitive orientation of a comparability graph and an $O(N + M)$ algorithm to recognize interval graphs. Both of them use the LexBFS algorithm. It can be proved [2] that if $G$ is a co-comparability graph and $\pi = v_1, v_2, \ldots, v_N$ is a LexBFS order of $G$, then there exists a transitive orientation of $\overline{G}$ such that $v_N$ is a sink/source of the orientation. Moreover, if $G$ is a comparability graph and $\pi = v_1, v_2, \ldots, v_N$ is a LexBFS order of $G$, then there exists a transitive orientation of $G$ such that $v_N$ is a sink/source of the orientation.

4 Lexicographic Breadth-First Search 

The Lexicographic Breadth-First Search (LexBFS) algorithm was introduced by D. Rose, R. Tarjan and G. Lueker in 1976 for finding a perfect elimination order, if one exists. The LexBFS algorithm is a restriction of the widely used Breadth-First Search (BFS) algorithm in the following sense: each order of vertices that LexBFS can produce is a BFS order. The difference between them is that the LexBFS algorithm additionally assigns labels to nodes and in each step chooses a node whose label is lexicographically the largest.

4.1 Characterization of BFS and LexBFS orderings 

We present and compare two characterizations of the vertex orderings that can be obtained by the BFS algorithm and the LexBFS algorithm.

Let $x$ be a vertex of a graph G and let $N_x$ denote its neighborhood. Let Q be a FIFO queue. We present an equivalent version of Tarjan's BFS algorithm:


Breadth-First Search algorithm


BFS()
    for x = 1 to n do π^{-1}(x) = 0
    Q = ∅
    for i = 1 to n do
        if Q.nonEmpty() then x = Q.dequeue()
        else x = any node s.t. π^{-1}(x) = 0
        π^{-1}(x) = i, π(i) = x
        for each y ∈ N(x) s.t. π^{-1}(y) = 0 do
            if y ∉ Q then Q.enqueue(y)
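A runnable C++ version of this pseudocode is sketched below. It assumes 0-based vertices and adjacency lists, uses -1 instead of 0 for "not visited", and replaces the membership test $y \notin Q$ with a boolean array. The function name and the choice of the smallest unvisited vertex as a new start are assumptions of this sketch.

```cpp
#include <cassert>
#include <queue>
#include <vector>

using Graph = std::vector<std::vector<int>>;

// BFS ordering following the pseudocode (0-based vertices).
// pos[x] == -1 plays the role of pi^{-1}(x) = 0 ("not visited yet").
std::vector<int> bfsOrder(const Graph& g) {
    const int n = static_cast<int>(g.size());
    std::vector<int> pos(n, -1), order;
    std::vector<bool> inQueue(n, false);  // O(1) version of the test y in Q
    std::queue<int> q;
    int nextStart = 0;
    for (int i = 0; i < n; ++i) {
        int x;
        if (!q.empty()) {
            x = q.front();
            q.pop();
        } else {  // queue empty: start from the smallest unvisited vertex
            while (pos[nextStart] != -1) ++nextStart;
            x = nextStart;
        }
        pos[x] = i;
        order.push_back(x);
        for (int y : g[x])
            if (pos[y] == -1 && !inQueue[y]) {
                inQueue[y] = true;
                q.push(y);
            }
    }
    return order;
}
```

The order in which N(x) is scanned decides the order among the newly enqueued neighbors. For the graph with edges 01, 02, 13, 14, 23, scanning the neighbors of 1 as (0, 3, 4) gives 0, 1, 2, 3, 4, while (0, 4, 3) gives 0, 1, 2, 4, 3. Both are BFS orders.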
Property B. If $a < b < c$, $ac \in E$ and $ab \notin E$, then there exists $d < a$ such that $db \in E$.

Lemma 4.1. $\pi$ is a BFS order $\iff$ $\pi$ satisfies property B.

Proof. ($\Rightarrow$) Let $\pi$ be a BFS order and let Q be the FIFO queue used by the algorithm. We assume that the nodes of the graph G are renumbered according to $\pi$.



During the algorithm a was visited before b, which was visited before c. When the algorithm visits a, it adds c to Q (unless c is already there), because $ac \in E$, and it does not add b to Q, because $ab \notin E$. Since c stays in Q from that moment until it is visited, and b is visited before c, b must have been added to Q before c, and hence before a was visited. Therefore b was added to Q when the algorithm visited some vertex d before a with $db \in E$. In other words, there exists $d < a$ such that $db \in E$.



($\Leftarrow$) Let $\pi_0$ be an order satisfying property B. We want to show that $\pi_0$ is a BFS order. Let d be some vertex of G. When the BFS algorithm visits d, it pushes onto the queue all neighbors of d that have not been visited yet. Then BFS pops the next vertex from the queue. Let a be another vertex of G. If the BFS algorithm visits d before a, then all unvisited neighbors of d are placed in $\pi_0$ before all unvisited neighbors of a. Hence it is sufficient to show that if $d < a$ in $\pi_0$, then all neighbors of d that are placed to the right of d in $\pi_0$ lie before all neighbors of a that are to the right of a in $\pi_0$. But this is equivalent to property B. See the figure above.

□ 

Let $x$ be a vertex of a graph G and let $N_x$ be the neighborhood of $x$. Let $label_x$ denote the label of $x$. Let pQ be a priority queue of vertices in which the vertex with the lexicographically largest label has the highest priority. Consider the following algorithm:
Lexicographic Breadth-First Search algorithm


LexBFS()
    for x = 1 to n do
        π^{-1}(x) = 0
        label_x = 0
        pQ.enqueue(x)
    for i = 1 to n do
        x = pQ.dequeue()   // label_x is lexicographically the largest
        π^{-1}(x) = i, π(i) = x
        for each y ∈ N_x s.t. π^{-1}(y) = 0 do
            label_y = label_y ∘ (n - i)
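The pseudocode translates almost directly into C++ if labels are stored as vectors of integers: `std::vector`'s `operator<` is lexicographic, and a proper prefix compares smaller, as LexBFS requires. The sketch below is a naive illustration under these assumptions. It uses 0-based vertices, starts with empty labels (a common initial prefix does not change comparisons), replaces the priority queue with a linear scan, and breaks ties by the smallest index. It does not achieve the $O(N + M)$ bound.

```cpp
#include <cassert>
#include <vector>

using Graph = std::vector<std::vector<int>>;

// Naive LexBFS with explicit string labels (0-based vertices).
// Labels are compared with std::vector's lexicographic operator<.
std::vector<int> lexBfsOrder(const Graph& g) {
    const int n = static_cast<int>(g.size());
    std::vector<std::vector<int>> label(n);  // all labels start equal (empty)
    std::vector<bool> visited(n, false);
    std::vector<int> order;
    for (int i = 1; i <= n; ++i) {           // iteration numbers 1..n
        int x = -1;                          // unvisited vertex, largest label
        for (int v = 0; v < n; ++v)
            if (!visited[v] && (x == -1 || label[x] < label[v])) x = v;
        visited[x] = true;
        order.push_back(x);
        for (int y : g[x])
            if (!visited[y]) label[y].push_back(n - i);  // label_y o (n - i)
    }
    return order;
}
```

On the graph with edges 01, 02, 13, 14, 23 it returns 0, 1, 2, 3, 4 for either order of the neighbors of vertex 1. After vertex 2 is visited, vertex 3 has the label (3, 2), which is larger than the label (3) of vertex 4. So the BFS order 0, 1, 2, 4, 3 is not a LexBFS order.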


Property LB. If $a < b < c$, $ac \in E$ and $ab \notin E$, then there exists $d < a$ such that $db \in E$ and $dc \notin E$.

Lemma 4.2. $\pi$ is a LexBFS order $\iff$ $\pi$ satisfies property LB.

Proof. ($\Rightarrow$) Let $\pi$ be a LexBFS order and let pQ be the priority queue used by the LexBFS algorithm. We assume that the nodes of the graph G are renumbered according to $\pi$.



[Figure: vertices a, b, c in the order π]


During the algorithm a was visited before b, which was visited before c. Let $b_1, b_2, \ldots$ be the label of b and $c_1, c_2, \ldots$ be the label of c. Let i be the number of the iteration and N be the number of vertices in G.

When the algorithm visits a, it concatenates the label of c with $N - i$, as $ac \in E$, and it does not concatenate the label of b with $N - i$, as $ab \notin E$. Since the label of b is lexicographically larger than the label of c, there is an index $j_0$ such that $b_j = c_j$ for all $j < j_0$ and $b_{j_0} > c_{j_0}$. The index $j_0$ is the first index at which the labels were updated in different iterations. The label of b was updated before the label of c because $b_{j_0} > c_{j_0}$ and the number $N - i$ decreases as i increases. The index $j_0$ exists if and only if the algorithm visited some d such that it concatenated the label of b with $b_{j_0}$ and did not concatenate the label of c with $b_{j_0}$. It must be that $d < a$, because otherwise we would have:
It must be that d < a because otherwise we would have: 

1. if $ad \in E$, then $a < d < c < b$, as the numbers $N - i$ decrease and $ac \in E$, $ab \notin E$;

2. if $ad \notin E$, then $a < c < d < b$, as $ac \in E$ and $ab \notin E$.


Both cases contradict $a < b < c$. Therefore, there exists $d < a$ such that $db \in E$ and $dc \notin E$.



($\Leftarrow$) Let $\pi_0$ be an order satisfying property LB. We want to show that $\pi_0$ is a LexBFS order. Let d be a vertex of G. When the LexBFS algorithm visits d, it updates the labels of all unvisited neighbors of d. Let a be another vertex of G. If the LexBFS algorithm visits d before a, then all neighbors of d that are not adjacent to a have labels greater than the labels of all neighbors of a, because the numbers appended to the labels decrease for successive vertices. So, to prove the claim, we must show that if $d < a$ in $\pi_0$, then all neighbors of d that are to the right of d in $\pi_0$ and are not adjacent to a lie before all neighbors of a that are to the right of a in $\pi_0$. But again, this is equivalent to property LB. See the figure above.

□ 

It is easy to see that property LB implies property B, so the LexBFS algorithm is a restriction of the BFS algorithm.
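Lemmas 4.1 and 4.2 can be checked on small examples with a brute-force test of properties B and LB. The C++ sketch below is only such a checking aid and runs in $O(N^4)$ time. It assumes an adjacency matrix and an ordering given as a list of vertices, and the function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

using AdjMatrix = std::vector<std::vector<bool>>;

// Brute-force check of property B (lexicographic == false) or LB (true).
// Positions in `order` play the role of renumbered vertices a < b < c, d < a.
bool satisfiesProperty(const AdjMatrix& adj, const std::vector<int>& order, bool lexicographic) {
    const int n = static_cast<int>(order.size());
    auto E = [&](int i, int j) { return static_cast<bool>(adj[order[i]][order[j]]); };
    for (int a = 0; a < n; ++a)
        for (int b = a + 1; b < n; ++b)
            for (int c = b + 1; c < n; ++c) {
                if (!E(a, c) || E(a, b)) continue;  // premise: ac in E, ab not in E
                bool found = false;                 // look for a witness d < a
                for (int d = 0; d < a && !found; ++d)
                    found = E(d, b) && (!lexicographic || !E(d, c));
                if (!found) return false;
            }
    return true;
}

bool satisfiesB(const AdjMatrix& adj, const std::vector<int>& order) {
    return satisfiesProperty(adj, order, false);
}

bool satisfiesLB(const AdjMatrix& adj, const std::vector<int>& order) {
    return satisfiesProperty(adj, order, true);
}
```

For the graph with edges 01, 02, 13, 14, 23, the order 0, 1, 2, 4, 3 satisfies B but violates LB. Take a, b, c = 2, 4, 3: the only vertex before 2 that is adjacent to 4 is 1, and 1 is also adjacent to 3. This matches the fact that the order is a BFS order but not a LexBFS order.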

4.2 Two implementations 

The first implementation of the LexBFS algorithm was proposed by D. J. Rose, R. E. Tarjan and G. S. Lueker in 1976 [3]. They use a doubly-linked list $L_k$ to store the vertices with the same label k. All lists are stored in the list L in descending order of their labels k. Additionally, each vertex x has a pair of pointers: the first leads to the list $L_k$ containing x and the second leads to the position of x on the list $L_k$.

At the beginning of the algorithm, all vertices have the same label 0 and they are on the list $L_0$. There are two operations: getting a vertex x with the lexicographically largest label and updating the labels of all unvisited nodes adjacent to x. To perform the first operation, the algorithm takes the first list $L_k$ from L and returns the first vertex from $L_k$. In the second operation, for each unvisited y adjacent to x, the algorithm concatenates the label k of y with the number $N - i$, where i is the iteration number. Then the algorithm removes y from the list $L_k$ and inserts it into the list $L_{k \circ (N - i)}$. If the list $L_{k \circ (N - i)}$ does not exist in L yet, the algorithm creates it. This implementation has $O(N + M)$ time complexity.
The second implementation was proposed by Habib, McConnell, Paul and Viennot in 2000 [2], and it uses the partition refinement technique. Let V be a doubly-linked list consisting of all vertices of G. Let L be a doubly-linked list of classes of vertices. All vertices in a class occupy consecutive elements in V, and a class is represented by a pair of pointers to its first and last element. Each vertex x has a pointer to the class containing x.


The LexBFS algorithm using partition refinement


LexBFS()
    L = a single-element list with one class containing all vertices
    for i = 1 to n do
        x = the first element of the first class on the list L
        remove x from L
        π^{-1}(x) = i, π(i) = x
        // partition
        for each class C ∈ L do
            C_1 = C ∩ N_x
            C_2 = C \ C_1
            replace C by C_1, C_2 in L   // empty classes are dropped


During the partition, each $y \in N_x$ is removed from its old class C and inserted into the new class $C_1$. The partition procedure can be implemented in $O(|N_x|)$ time.
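Partition refinement can be tried out with the sketch below. It keeps the class order and the stable splits of the pseudocode, with $C_1$ placed before $C_2$. However, it stores classes in plain vectors and rebuilds them in every step, so it runs in $O(N^2 + M)$ time rather than in linear time. It is our own simplified illustration, with 0-based vertices and a hypothetical function name.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using Graph = std::vector<std::vector<int>>;

// LexBFS by partition refinement (simplified: classes are rebuilt each step).
std::vector<int> lexBfsPartition(const Graph& g) {
    const int n = static_cast<int>(g.size());
    std::vector<std::vector<int>> classes;
    if (n > 0) {
        classes.emplace_back();
        for (int v = 0; v < n; ++v) classes[0].push_back(v);  // one class, all vertices
    }
    std::vector<int> order;
    std::vector<bool> inNx(n, false);
    while (!classes.empty()) {
        int x = classes.front().front();  // first element of the first class
        classes.front().erase(classes.front().begin());
        order.push_back(x);
        for (int y : g[x]) inNx[y] = true;
        std::vector<std::vector<int>> refined;
        for (const auto& c : classes) {   // split C into C_1 = C n N_x, C_2 = C \ C_1
            std::vector<int> c1, c2;
            for (int v : c) (inNx[v] ? c1 : c2).push_back(v);
            if (!c1.empty()) refined.push_back(c1);  // C_1 goes first
            if (!c2.empty()) refined.push_back(c2);  // empty classes are dropped
        }
        for (int y : g[x]) inNx[y] = false;
        classes = std::move(refined);
    }
    return order;
}
```

Because every split is stable and puts $C_1$ first, the first element of the first class is always the smallest-index vertex among those with the largest label. On a given graph, this sketch therefore produces the same order as a LexBFS that breaks ties by the smallest index.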

5 Chordal graphs 

Before we present the algorithm to test chordality of graphs, we prove a theorem introduced by D. J. Rose, R. E. Tarjan and G. S. Lueker [3].

Theorem 5.1 (Rose, Tarjan, Lueker). A graph G is chordal if and only if a LexBFS order of G is a perfect elimination order.

Proof. ($\Rightarrow$) Let G be a chordal graph and let $\pi = v_1, \ldots, v_N$ be its LexBFS order. We assume that the nodes of G are renumbered according to $\pi$. We show that each vertex $v_i$ is simplicial in the graph induced by $v_1, \ldots, v_i$.

Assume, for contradiction, that some $v_i$ is not simplicial. Then there exist $a, b$ with $a < b < v_i$, both adjacent to $v_i$ and not adjacent to each other.
Because $\pi$ satisfies property LB, applied to $a < b < v_i$, there exists some $c < a$ such that c is adjacent to b and not adjacent to $v_i$. Note that $ca \notin E$, because otherwise $(c, a, v_i, b)$ would be a chordless cycle, contradicting the chordality of G.



[Figure: vertices c, a, b, v_i in the order π]


Now we have $c < a < b < v_i$, $ca \notin E$ and $cb \in E$. Again, we use property LB with respect to c, a, b and obtain $d < c$ such that $da \in E$ and $db \notin E$. Moreover, d is not adjacent to c, because of the cycle $(d, c, b, v_i, a)$ and the chordality of G.



Next, we can apply property LB to the vertices d, c, a in the same way. This step can be repeated indefinitely, each time producing a new vertex to the left of all previous ones, which contradicts the finiteness of G.

So we have proved that each vertex $v_i$ is simplicial in the graph induced by $v_1, \ldots, v_i$, that is, the order $v_1, \ldots, v_N$ is a perfect elimination order.

($\Leftarrow$) It suffices to prove that if G has any perfect elimination order, then it is chordal.

Let $\pi = v_1, \ldots, v_N$ be a perfect elimination order of G. Assume that G contains a cycle C of length at least 4 and let $v_i$ be the vertex of C with the greatest index in $\pi$. Let a and b be the two vertices adjacent to $v_i$ on the cycle C. Because a and b are to the left of $v_i$ in $\pi$ and $\pi$ is a perfect elimination order, they are adjacent. Therefore C has the chord ab, which finishes the proof.

□ 


5.1 Maximum Cardinality Search 

In 1984 Robert E. Tarjan and M. Yannakakis introduced in [6] the Maximum Cardinality Search (MCS) algorithm as an alternative method for finding a perfect elimination ordering of chordal graphs. Instead of strings, Maximum Cardinality Search uses natural numbers as vertex labels. In each iteration, the algorithm chooses a new vertex with the largest label, that is, the vertex whose neighborhood in the graph induced by the nodes chosen so far is the largest among all vertices not chosen yet. The MCS algorithm has an $O(N + M)$ time implementation [6].
Let G be a graph and pQ be a priority queue. For each $x \in V$, $label_x \in \mathbb{N}$. We present Tarjan and Yannakakis's MCS algorithm:


Maximum Cardinality Search algorithm


MCS()
    for x = 1 to n do
        π^{-1}(x) = 0
        label_x = 0
        pQ.enqueue(x)
    for i = 1 to n do
        x = pQ.dequeue()   // label_x is the largest
        π^{-1}(x) = i, π(i) = x
        for each y ∈ N_x such that π^{-1}(y) = 0 do
            label_y = label_y + 1
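A direct C++ rendering of MCS is sketched below, assuming 0-based vertices and adjacency lists. The priority queue is replaced by a linear scan, giving $O(N^2 + M)$ time in total, and ties go to the smallest vertex index. A linear-time version typically keeps the unchosen vertices in buckets indexed by their label. The function name is hypothetical.

```cpp
#include <cassert>
#include <vector>

using Graph = std::vector<std::vector<int>>;

// Maximum Cardinality Search (0-based vertices).
// label[y] counts how many already chosen vertices are adjacent to y.
std::vector<int> mcsOrder(const Graph& g) {
    const int n = static_cast<int>(g.size());
    std::vector<int> label(n, 0), order;
    std::vector<bool> chosen(n, false);
    for (int i = 0; i < n; ++i) {
        int x = -1;  // unchosen vertex with the largest label
        for (int v = 0; v < n; ++v)
            if (!chosen[v] && (x == -1 || label[v] > label[x])) x = v;
        chosen[x] = true;
        order.push_back(x);
        for (int y : g[x])
            if (!chosen[y]) ++label[y];  // label_y = label_y + 1
    }
    return order;
}
```

On the 4-cycle with the chord 12 (edges 01, 02, 12, 13, 23), the sketch chooses 0, 1, 2, 3, and this order is a perfect elimination order, as Theorem 5.2 predicts for a chordal graph.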


Robert E. Tarjan and M. Yannakakis proved the following theorem [6]. 

Theorem 5.2. G is a chordal graph if and only if an MCS order of G is a perfect elimination order.

5.2 Algorithm to test chordality 

Theorem 5.1 gives us the following tool to test whether a given graph is chordal. First we run the LexBFS algorithm to produce a LexBFS order. Next we check whether the LexBFS order is a perfect elimination order.

Let $\pi = v_1, \ldots, v_N$ be an order returned by the LexBFS algorithm. For each $v_i$ let $LN_{v_i} \subseteq N_{v_i}$ be the set of vertices adjacent to $v_i$ and to the left of $v_i$ in $\pi$, and let $p_{v_i} \in LN_{v_i}$ be the rightmost vertex in $LN_{v_i}$.

The algorithm presented below tests whether $\pi$ is a perfect elimination order. To do so, it is enough to check that $LN_{v_i} \setminus \{p_{v_i}\} \subseteq LN_{p_{v_i}}$ for each $v_i$. The correctness of this approach is proved later.
Test if a LexBFS order is a perfect elimination order


chordalityTest()
    for x = 1 to n do
        p_x = 0
        for each y ∈ N_x such that π^{-1}(y) < π^{-1}(x) do
            LN_x.add(y)
            if π^{-1}(y) > π^{-1}(p_x) then p_x = y   // with π^{-1}(0) = 0
    for x = 1 to n do
        for each y ∈ N_x do
            visited_y = 1
        for each y ∈ N_x do
            if p_y = x then
                // check if LN_y - {x} ⊆ LN_x
                for each z ∈ LN_y such that z ≠ x do
                    if visited_z ≠ 1 then
                        return false
        for each y ∈ N_x do
            visited_y = 0
    return true
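A runnable C++ version of chordalityTest() is sketched below, assuming 0-based vertices, adjacency lists, and -1 in place of $p_x = 0$. The function name is hypothetical. Marking $N_x$ is enough for the subset test: every $z \in LN_y \setminus \{x\}$ lies to the left of $x = p_y$, so $z \in N_x$ already implies $z \in LN_x$.

```cpp
#include <cassert>
#include <vector>

using Graph = std::vector<std::vector<int>>;

// Checks LN_y \ {p_y} subset of LN_{p_y} for every vertex y; O(N + M) time.
// order[i] is the vertex at position i; p[y] == -1 means LN_y is empty.
bool isPeoLinear(const Graph& g, const std::vector<int>& order) {
    const int n = static_cast<int>(g.size());
    std::vector<int> pos(n), p(n, -1);
    for (int i = 0; i < n; ++i) pos[order[i]] = i;
    Graph ln(n);
    for (int x = 0; x < n; ++x)
        for (int y : g[x])
            if (pos[y] < pos[x]) {                // y is a left neighbor of x
                ln[x].push_back(y);
                if (p[x] == -1 || pos[y] > pos[p[x]]) p[x] = y;
            }
    std::vector<bool> visited(n, false);
    for (int x = 0; x < n; ++x) {
        for (int y : g[x]) visited[y] = true;     // mark N_x
        for (int y : g[x])
            if (p[y] == x)                        // x is the rightmost left neighbor of y
                for (int z : ln[y])
                    if (z != x && !visited[z]) return false;
        for (int y : g[x]) visited[y] = false;    // unmark N_x
    }
    return true;
}
```

Combined with any LexBFS implementation, for example the sketches in Section 4, this gives the complete $O(N + M)$ chordality test when the LexBFS step is itself linear.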


5.3 Correctness and complexity of algorithm 

We prove that the algorithm for testing if a LexBFS order is a perfect 
elimination order is correct. 

Let $\pi$ be a perfect elimination order. We show that the algorithm then returns true. Let v be some node of G. There are two cases:

1. $p_v = 0$. Then $LN_v$ is empty and the algorithm does not return false.

2. $p_v = u$ for some u. Then the algorithm checks whether $LN_v \setminus \{u\} \subseteq LN_u$. Because all nodes of the set $LN_v \setminus \{u\}$ are to the left of u, they are candidates for members of $LN_u$. It remains to show that they are adjacent to u. Because $u = p_v$, we have $u \in LN_v$, and $LN_v$ is a clique because $\pi$ is a perfect elimination order. Therefore all vertices of $LN_v \setminus \{u\}$ are adjacent to u and $LN_v \setminus \{u\} \subseteq LN_u$.

So the algorithm never returns false inside the loops, hence it returns true at the end.

Now let $\pi$ not be a perfect elimination order. We show that the algorithm then returns false. We assume that the vertices of G are renumbered according to $\pi$. As $\pi$ is not a perfect elimination order, there exists some node in $\pi$ whose left neighborhood does not induce a clique. Let v be the first such node in $\pi$ and let $p_v = u$ for some u. Because $u < v$ in $\pi$, $LN_u$ is a clique (as v is the first vertex in $\pi$ whose left neighborhood is not a clique). If $LN_v \setminus \{u\} \subseteq LN_u$ held, then $LN_v$ would consist of u and a subset of the clique $LN_u$, all adjacent to u, so $LN_v$ would be a clique. Therefore $LN_v \setminus \{u\} \not\subseteq LN_u$ and the algorithm returns false.

The time complexity of the algorithm is determined by the nested loops. Let v be some node of G. Note that the size of $LN_v$ is $O(|N_v|)$. Let us count how many times the algorithm scans $N_v$:

1. marking the visited array,

2. looking for $p_v$,

3. unmarking the visited array,

4. checking $LN_u \setminus \{v\} \subseteq LN_v$ for the vertices u with $p_u = v$; since every u has exactly one vertex $p_u$, each list $LN_u$ is scanned at most once in this step.

It means that each list $N_v$ is read at most four times, which gives $O(N + M)$ time for the whole graph. Together with the time of producing the LexBFS order, the test of chordality takes $O(N + M)$ time.

6 Parallel algorithm 

Testing chordality of graphs has two steps: finding a LexBFS order and checking whether the LexBFS order is a perfect elimination order. To parallelize the chordality test, we need to parallelize each of these steps separately.

6.1 Parallel LexBFS 

In our approach to the parallel version of LexBFS, the main loop of the algorithm runs on the CPU, and during each iteration i two tasks are performed on the GPU. The first one is choosing the vertex v with the lexicographically largest label, and the second one is concatenating the labels of the unvisited vertices adjacent to v with $N - i$. Both tasks are performed by N threads assigned to the N vertices of the graph.

We use the following data structures. The graph G is stored in an adjacency matrix Adj. A linked list L is a list of sets $L_k$. Each set $L_k$ includes all vertices whose labels are equal to k. We identify the label of a set with the label of the nodes in that set. These sets form a partition of the vertex set defined by the labels. L is sorted lexicographically in ascending order. As L is a linked list, each set of L has a pointer next leading to the next set in the order, and next = NULL in the last set of L. The order array stores the order of nodes computed by the algorithm. We say that a node is active if it has not been processed yet (see Figure 3).

Figure 2: LexBFS algorithm. The CPU runs the loop for i ← 1 ... N; in iteration i the GPU selects the vertex x with the lexicographically largest label, marks x, and, in parallel for each unmarked y ∈ N(x), sets label(y) ← label(y) ∘ (N - i).


Figure 3: List L including the sets $L_{k_1}, L_{k_2}, L_{k_3}, L_{k_4}, L_{k_5}$. The last set on the list is $L_{k_5}$.
At the beginning of the algorithm, all nodes of G are active and have the same label 0. This means that the list L has only one set, consisting of all nodes of G, and its next pointer is NULL.

In a sequential version of the algorithm, to find the vertex with the lexicographically largest label, the algorithm returns any vertex belonging to the last set on L. As the last set on L is characterized by its next pointer being NULL, this procedure can be performed in parallel by N threads as follows.

Let x be the vertex assigned to the thread $th_x$ and let $L_x$ be the set including x. Let current be a global variable shared by all threads. For each thread $th_x$ in parallel do:

if x is active and $L_x$.next = NULL then current ← x

After this procedure the variable current stores a vertex whose label is lexicographically the largest. Note that if there is more than one such vertex, we cannot predict which one will be stored in current.

Let i be the iteration of the main loop in which current has the lexicographically largest label. Let y be some unvisited neighbor of current, let $l_y$ be the label of y and let y be in the set $L_y$.

The update operation appends the number $N - i$ to the end of the label $l_y$. Note that the number $N - i$ has not appeared in any label so far and it is smaller than all numbers occurring in the labels.

Next, the algorithm removes y from $L_y$ and inserts it into the new set containing the nodes with the label $l_y \circ (N - i)$. If the new set does not exist yet, the algorithm creates it.

Let us look closely at the operation of creating a new set. Let A and B be two sets of nodes on the list L containing nodes labeled $l_A$ and $l_B$ respectively. Assume that $l_A < l_B$, i.e. A comes before B in L.

Lemma 6.1. If j is the number of the iteration during which the new set containing the nodes with the label $l_A \circ (N - j)$ is created, then $l_A < l_A \circ (N - j) < l_B$, so the new set lies between A and B in L.

Proof. Each label is a string of numbers. Let $l_A = (a_1, a_2, a_3, \ldots, a_{|l_A|})$, $l_A \circ (N - j) = (a_1, a_2, a_3, \ldots, a_{|l_A|}, N - j)$ and $l_B = (b_1, b_2, b_3, \ldots, b_{|l_B|})$.

Let k be the smallest index such that $a_i = b_i$ for all $i < k$ and $a_k \neq b_k$. There are two cases:

1. $k \le |l_A|$ and $k \le |l_B|$. Then $a_k < b_k$, and these two numbers determine $l_A \circ (N - j) < l_B$.

2. $|l_A| < |l_B|$ and $k = |l_A| + 1$. Then $l_A$ is a prefix of $l_B$, and the concatenation also gives $l_A \circ (N - j) < l_B$, because $N - j$ is smaller than all numbers in $l_B$, in particular smaller than $b_k$.

Note that always $l_A < l_A \circ (N - j)$, because $l_A$ is a proper prefix of $l_A \circ (N - j)$. □

Figure 4: The list L before and after moving x, z to the new set. Vertices x, y, z are adjacent to the current vertex.

The lemma gives us the following observation: 

Observation 6.2. When a new set is created for vertices from a given set S, it should be inserted between S and its successor on the list L.

Based on Observation 6.2, for each new set we can determine its place in the list without any additional list traversal or label comparisons.

How many new sets are created during one iteration? The answer is: at most one for each old set. Indeed, if y and z are neighbors of the current vertex and belong to the same set S, then their labels are equal both before and after concatenating them with $N - i$.

Since in our algorithm labels are updated in parallel, several threads could simultaneously create several new sets for the same new label and insert them into the list. To avoid such a mistake, we synchronize the threads between the steps below.
Let i be the number of the iteration and let current be the vertex with the lexicographically largest label chosen during iteration i. Let x be the vertex assigned to the thread $th_x$, and let x belong to the set $L_x$. Let $oldNext_x$ and $newNext_x$ be private variables of the thread $th_x$. For all threads $th_x$ in parallel do:

1. if x is not active or x is not adjacent to current then stop

2. $oldNext_x$ ← $L_x$.next

3. create a new set $newNext_x$

4. synchronization: wait for other threads

5. set pointers:

   $L_x$.next ← $newNext_x$
   $newNext_x$.next ← $oldNext_x$

6. synchronization: wait for other threads

7. move x into $L_x$.next

Each thread creates its new set $newNext_x$ and inserts it into the list. After that, $newNext_x$.next = $oldNext_x$ for each node x, but only for one of them will we have $L_x$.next = $newNext_x$. See Figure 5.



Figure 5: Inserting a new set $newNext_x$ into the list L


Note that we cannot predict which $newNext_x$ will end up in $L_x$.next, so the threads are synchronized to make all of them read the same $L_x$.next inserted after $L_x$ in the list. The new sets of the other threads are abandoned.

Now let us look at removing vertices from the sets. After this operation some sets can be empty. To get a vertex with the lexicographically largest label, we take the last set on the list, and this set must not be empty. Therefore, after each update operation, once vertices have been removed from their sets, we remove all empty sets from the list.
Before we show the procedure for removing the empty sets, consider the following lemma.

Lemma 6.3. If a set is empty after the update operation, then its successor on the list, if it exists, is nonempty.



Proof. Let $L_k$ be a nonempty set before the update operation. If $L_k$ is empty after the update operation, then from Observation 6.2 we know that the set $L_k$.next includes all vertices that were in $L_k$ before the update, so it is nonempty. □

The observation that for each empty set its predecessor and successor are nonempty means that all empty sets can be removed in parallel at once. The removal of one set requires changing only the links of its two adjacent nonempty sets. This is correct because it requires no additional traversal of the list and is independent of removing the other empty sets.

To find out which sets are empty, we use an additional counter array. For each set in the list, counter stores 1 if the set includes at least one active node and 0 otherwise. At the beginning all slots of counter are 0.

Let x be the vertex assigned to the thread $th_x$ and let x be in the set $L_x$. Let i be the number of the iteration and let current be the vertex with the lexicographically largest label during iteration i. Let $oldNext_x$ be a private variable of the thread $th_x$. For all threads $th_x$ in parallel do:

1. if x is not active then stop

2. counter[$L_x$] ← 1

3. $oldNext_x$ ← $L_x$.next

4. synchronization: wait for other threads

5. if $oldNext_x$ = NULL or counter[$oldNext_x$] = 1 then stop

6. set $L_x$.next ← $oldNext_x$.next

We synchronize the threads between these instructions to make sure that counting vertices and removing empty sets are correct. Otherwise some sets could be removed although they are not empty, and pointers could be changed improperly.
In our implementation, getting the vertex with the lexicographically largest label and updating the labels of all adjacent vertices are performed in four stages. Each stage runs on the GPU, and the stages are synchronized: a new one does not start until the previous one has finished. Because we use the CUDA language to implement the algorithm, we use the term kernel instead of stage. Each kernel is executed by N threads, one for each vertex.

At the beginning of the algorithm all vertices have the same label, so we set current to an arbitrary vertex, vertex 1. Next, in each iteration of the for loop, one vertex is added to the LexBFS order.

The first kernel adds the current vertex to the LexBFS order and marks it as non-active. Next, the first part of the label update is performed: resetting the counters of the sets to 0 and saving the next pointer of each set.

In the second kernel, the new sets are inserted into the list L.

In the third kernel, each vertex that is active and adjacent to current is moved to the next set. Next, the counting is performed: for each active vertex, the counter of its set is set to 1. At the end, the first part of deleting the empty sets is performed: saving the next pointer of each set.

The last kernel deletes all empty sets and chooses the new current vertex. 

At the end of the LexBFS algorithm, the order array stores the LexBFS 
order. 
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Parallel Lexicographic Breadth-Frist Search algorithm 


//run on the cpu
parallelLexBFS()
    current ← 1
    for time ← 1 to N do
        kernel1()
        kernel2()
        kernel3()
        kernel4()

//run on the gpu, one thread per vertex x
kernel1()
    if x is active then
        oldNext_x ← L_x.next
        counter[L_x] ← 0
        create newNext_x
        if x is current then
            order[time] ← x
            mark x as non-active

//run on the gpu, one thread per vertex x
kernel2()
    if x is active and x is adjacent to current then
        //inserting new sets to list
        L_x.next ← newNext_x
        newNext_x.next ← oldNext_x

//run on the gpu, one thread per vertex x
kernel3()
    if x is active and x is adjacent to current then
        //moving to new sets
        move x to L_x.next
    if x is active then
        counter[L_x] ← 1
        oldNext_x ← L_x.next

//run on the gpu, one thread per vertex x
kernel4()
    if x is active then
        if oldNext_x ≠ NULL and counter[oldNext_x] = 0 then
            //deleting empty sets
            L_x.next ← oldNext_x.next
        if L_x.next = NULL then
            //updating current
            current ← x
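To make the interplay of the four kernels concrete, the following is a minimal sequential simulation in C, assuming the graph is given as an adjacency matrix and vertices are numbered from 0. It is a sketch, not the CUDA implementation of this thesis: each loop over x stands for one kernel launch (one iteration per thread), finishing a loop before starting the next one plays the role of the synchronization between kernels, and the name parallel_lexbfs and the numbering 1 + t·N + x of new sets are our own assumptions.

```c
#include <assert.h>
#include <stdlib.h>

#define NONE (-1) /* plays the role of NULL in the pseudocode */

/* Sequential simulation of parallelLexBFS(); adj is an n*n row-major 0/1
   adjacency matrix and order receives the LexBFS order.
   Hypothetical assumption: the set created by the thread of vertex x in
   iteration t gets the fresh number 1 + t*n + x, so n*n + 1 numbers suffice. */
void parallel_lexbfs(int n, const unsigned char *adj, int *order)
{
    size_t nsets = (size_t)n * n + 1;
    int *label = calloc(n, sizeof *label);            /* L_x: all start in set 0 */
    int *old_next = malloc(n * sizeof *old_next);     /* oldNext_x */
    int *new_next = malloc(n * sizeof *new_next);     /* newNext_x */
    unsigned char *active = malloc(n);
    int *next_set = malloc(nsets * sizeof *next_set); /* S.next for set S */
    unsigned char *counter = calloc(nsets, 1);        /* set to 1 in kernel3 if S is non-empty */
    assert(label && old_next && new_next && active && next_set && counter);

    for (int x = 0; x < n; x++) active[x] = 1;
    next_set[0] = NONE;   /* the list L initially holds only set 0 */
    int current = 0;      /* all labels are equal, so any start vertex will do */

    for (int t = 0; t < n; t++) {
        const unsigned char *row = adj + (size_t)current * n; /* row of current */

        /* kernel1: save next pointers, reset counters, emit current */
        for (int x = 0; x < n; x++) {
            if (!active[x]) continue;
            old_next[x] = next_set[label[x]];
            counter[label[x]] = 0;
            new_next[x] = 1 + t * n + x;
            if (x == current) { order[t] = x; active[x] = 0; }
        }
        /* kernel2: insert a new set right after each set holding a neighbor;
           if several neighbors share a set, one write wins (here the last) */
        for (int x = 0; x < n; x++)
            if (active[x] && row[x]) {
                next_set[label[x]] = new_next[x];
                next_set[new_next[x]] = old_next[x];
            }
        /* kernel3: move neighbors, mark non-empty sets, save next pointers */
        for (int x = 0; x < n; x++) {
            if (active[x] && row[x]) label[x] = next_set[label[x]];
            if (active[x]) {
                counter[label[x]] = 1;
                old_next[x] = next_set[label[x]];
            }
        }
        /* kernel4: unlink an empty successor set; any vertex of the last set
           has the lexicographically largest label (here the last one wins) */
        for (int x = 0; x < n; x++) {
            if (!active[x]) continue;
            if (old_next[x] != NONE && counter[old_next[x]] == 0)
                next_set[label[x]] = next_set[old_next[x]];
            if (next_set[label[x]] == NONE) current = x;
        }
    }
    free(label); free(old_next); free(new_next);
    free(active); free(next_set); free(counter);
}
```

For example, on the path with edges 0–1 and 1–2 the simulation produces the order 0, 1, 2. On a real GPU the winners of the concurrent writes in kernel2 and kernel4 are unspecified; any winner yields a valid LexBFS order, which is why the simulation may simply let the last write win.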
6.2 Parallel test for perfect elimination order

Now we are given the LexBFS order $\pi$. The second step of testing
chordality of graphs is checking whether the LexBFS order is a perfect
elimination order, that is, whether for each vertex x the neighborhood of x
to its left in the order forms a clique.
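As a reference point, the definition can be checked directly. The C sketch below (the name is_peo_bruteforce is ours; it runs in $O(N^3)$ time and is not the method used in this thesis) tests whether the left neighborhood of every vertex is a clique, assuming an adjacency matrix and 0-based vertices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Direct check of the definition: for every vertex x, any two neighbors of x
   that lie to the left of x in the order must be adjacent.
   adj is an n*n row-major 0/1 matrix; order[i] is the vertex at position i. */
bool is_peo_bruteforce(int n, const unsigned char *adj, const int *order)
{
    int *pos = malloc(n * sizeof *pos);   /* pos[v] = position of v (order^{-1}) */
    assert(pos != NULL);
    for (int i = 0; i < n; i++) pos[order[i]] = i;

    bool ok = true;
    for (int x = 0; x < n && ok; x++)
        for (int y = 0; y < n && ok; y++)
            for (int z = y + 1; z < n && ok; z++) {
                bool y_left = adj[(size_t)x * n + y] && pos[y] < pos[x];
                bool z_left = adj[(size_t)x * n + z] && pos[z] < pos[x];
                if (y_left && z_left && !adj[(size_t)y * n + z])
                    ok = false;   /* two left neighbors of x are not adjacent */
            }
    free(pos);
    return ok;
}
```

On the path 0–1–2, the order 0, 1, 2 passes, while 0, 2, 1 fails because the left neighbors 0 and 2 of vertex 1 are not adjacent.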

Let $LN_x \subseteq N_x$ be the set of all nodes adjacent to x that lie to the left
of x in $\pi$, and let $p_x$ be the rightmost node in $LN_x$. In the sequential
version of the algorithm, for each node x with $LN_x \neq \emptyset$ we check whether
$LN_x \setminus \{p_x\} \subseteq LN_{p_x}$. Now we do this in parallel for all nodes.

The algorithm has two kernels. The first one computes, in parallel for each x,
the left neighborhood $LN_x$ and the rightmost vertex $p_x$ in $LN_x$.
In the second kernel, each thread $th_x$ processes the left neighborhood of x
and the left neighborhood of $p_x$. If some left neighbor of x other than $p_x$
is not a left neighbor of $p_x$, then $th_x$ sets the global variable flag to false.
At the end, if flag is still true, then the order is a perfect elimination order.


Parallel Test for Perfect Elimination Order 


//run on the cpu
parallelTestPEO()
    flag ← true
    preparationLNandP()
    testing()
    if flag = true then return YES
    else return NO

//run on the gpu, one thread per vertex x
preparationLNandP()
    p_x ← NULL
    for each y adjacent to x do
        if order^{-1}(y) < order^{-1}(x) then
            insert y into LN_x
            if p_x = NULL or order^{-1}(y) > order^{-1}(p_x) then
                p_x ← y

//run on the gpu, one thread per vertex x
testing()
    for each y adjacent to x do
        if y ∈ LN_x and y ≠ p_x and y ∉ LN_{p_x} then
            flag ← false
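The two kernels can be simulated sequentially in C as follows. This is a sketch assuming the adjacency matrix is available (as in the implementation details below) and vertices are numbered from 0. Instead of building $LN_x$ explicitly, the membership $y \in LN_x$ is tested from the matrix and the positions $order^{-1}$; this shortcut and the name parallel_test_peo are our assumptions, not part of the thesis code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NONE (-1)

/* Sequential simulation of parallelTestPEO(): the first loop plays the role
   of preparationLNandP() and the second the role of testing(), with one
   iteration per GPU thread x. "y in LN_v" is tested as adj[v][y] && pos[y] < pos[v]. */
bool parallel_test_peo(int n, const unsigned char *adj, const int *order)
{
    int *pos = malloc(n * sizeof *pos);   /* order^{-1} */
    int *p = malloc(n * sizeof *p);       /* p_x, or NONE if LN_x is empty */
    assert(pos && p);
    for (int i = 0; i < n; i++) pos[order[i]] = i;

    /* kernel 1: rightmost left neighbor p_x of every x */
    for (int x = 0; x < n; x++) {
        p[x] = NONE;
        for (int y = 0; y < n; y++)
            if (adj[(size_t)x * n + y] && pos[y] < pos[x] &&
                (p[x] == NONE || pos[y] > pos[p[x]]))
                p[x] = y;
    }

    /* kernel 2: every y in LN_x other than p_x must also be in LN_{p_x} */
    bool flag = true;
    for (int x = 0; x < n; x++) {
        if (p[x] == NONE) continue;       /* empty LN_x: nothing to check */
        int px = p[x];
        for (int y = 0; y < n; y++) {
            bool in_lnx = adj[(size_t)x * n + y] && pos[y] < pos[x];
            bool in_lnp = adj[(size_t)px * n + y] && pos[y] < pos[px];
            if (in_lnx && y != px && !in_lnp)
                flag = false;             /* concurrent writes of false are harmless */
        }
    }
    free(pos);
    free(p);
    return flag;
}
```

Each thread only writes the value false to flag, so on the GPU the concurrent writes need no atomic operation.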
6.3 Details of the parallel implementation

After reading the input, the algorithm copies the adjacency matrix
of the graph to an array in device memory. During the LexBFS algorithm,
no other memory transfer between host and device is performed.

Since the algorithm does not compare any labels, the label concatenation
can be omitted. Instead, the algorithm assigns to each new set a number
that has not appeared before. In each iteration at most N new sets are
created, hence during the whole algorithm there are at most $N^2$ new sets.
Since each set has a pointer to the next set in the list, the algorithm stores
all pointers in an array of size $N^2$, indexed by the numbers of the sets.
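The text only requires that every new set gets a number that has not appeared before. One simple way to guarantee this without a shared counter, given here as a hypothetical sketch (the formula 1 + t·N + x, the 0-based indices and the name fresh_set_id are our assumptions), is to derive the number from the iteration t and the vertex x whose thread creates the set:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical numbering: set 0 is the initial set, and the set created by
   the thread of vertex x (0 <= x < n) in iteration t (0 <= t < n) gets the
   number 1 + t*n + x. Distinct (t, x) pairs give distinct numbers, so no
   number is ever reused, and all numbers lie in 0 .. n*n. */
size_t fresh_set_id(size_t t, size_t x, size_t n)
{
    assert(t < n && x < n);
    return 1 + t * n + x;
}
```

Because set 0 is reserved for the initial set, this scheme needs $N^2 + 1$ entries rather than exactly $N^2$. Every thread computes its number independently, so no atomic operation is needed.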

Our parallel implementation of the LexBFS algorithm uses the following
arrays:

1. the Adj array of size $N^2$ for the boolean adjacency matrix.

2. the label array of size N for the integer labels of sets, one per vertex
(the number of the set containing the vertex).

3. the order array of size N for the integer indices of vertices
in the LexBFS order.

4. the next_label array of size $N^2$ for the integer indices of the next sets.

5. the old_next_label auxiliary array of size N for the saved values
from the next_label array, one value for each vertex.

6. the counter array of size $N^2$ for the boolean flags used to recognize
whether a set is empty.

7. the current variable for the integer index of the vertex whose label is
lexicographically the largest.
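A rough estimate of the device memory these arrays require is sketched below. It rests on our own assumptions: a boolean occupies 1 byte, an integer 4 bytes, and, since Adj is reused as the counter array, counter needs no storage of its own (device_bytes is a hypothetical helper, not part of the implementation).

```c
#include <assert.h>
#include <stdint.h>

/* Estimated device memory in bytes for a graph with n vertices,
   assuming 1-byte booleans and 4-byte integers. */
uint64_t device_bytes(uint64_t n)
{
    assert(n < (UINT64_C(1) << 30));  /* keeps 4*n*n below 2^62 */
    uint64_t adj = n * n * 1;         /* Adj, later reused as counter */
    uint64_t next_label = n * n * 4;  /* one pointer per set number */
    uint64_t per_vertex = 3 * n * 4;  /* label, order, old_next_label */
    uint64_t current = 4;             /* the current variable */
    return adj + next_label + per_vertex + current;
}
```

For N = 10000 this gives 500,120,004 bytes, roughly 500 MB: the two arrays of size $N^2$ dominate, while the per-vertex arrays contribute only 120,000 bytes.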

During an iteration, the algorithm processes one current vertex, which is
assigned to one row of every 2-dimensional array, and all threads process
that row at the same time. See the figure below.
Because each vertex becomes current exactly once during the LexBFS
algorithm, each row is visited only once. Therefore, to reduce the amount
of device memory used by our algorithm, we use the 2-dimensional Adj
array for two purposes: first as the adjacency matrix, then as the counter
array.

The second part of the chordality algorithm uses two arrays: the Adj array
and the order array. Because the Adj array is overwritten during the LexBFS
algorithm, the algorithm copies the adjacency matrix from host memory
to the Adj array in device memory again.

7 Tests and results 

We introduce the following terminology to be used in this section. A graph
$G = (V, E)$ with vertex set V of size N is sparse if $|E| = O(N)$, and
dense if $|E| = O(N^2)$. We consider the following classes of graphs:

1. Cliques on N vertices, for $N \in \{1000, 2000, 3000, \dots, 10000\}$.

2. Dense random graphs on 10000 vertices. 

3. Sparse random graphs on 10000 vertices. 

4. Trees on 10000 vertices. 

5. Chordal random graphs on 10000 vertices. 

We test two implementations. The sequential implementation is the Habib,
McConnell, Paul and Viennot algorithm presented in [2], which uses static
memory allocation. For each class, we also present the time excluding input
reading and the dynamic allocation of device memory. For the parallel
implementation, the time spent reading the input and allocating memory
on the GPU is many times greater than the remaining running time of the
algorithm. However, the algorithm cannot be implemented without these operations.
7.1 Cliques

Figure 6 presents timing results for cliques. For graphs with fewer than
1000 vertices, the sequential version is faster. When the number of vertices
is 10000, the parallel implementation is twice as fast as the sequential one.


Figure 6: Cliques 
7.2 Dense graphs

Figure 7 presents timing results for dense random graphs. In every test
the parallel implementation is almost twice as fast as the sequential
implementation.


Figure 7: Dense random graphs: $N = 10000$, $M = O(N^2)$
7.3 Sparse graphs

We have tested our implementation on sparse random graphs with
$M = 20N$. The parallel implementation is slower than the sequential
implementation (Figure 8).


Figure 8: Sparse random graphs: $N = 10000$, $M = 20N$
7.4 Trees

The results for trees are very similar to the results for sparse random 
graphs (Figure 9). 


Figure 9: Trees: N=10000 
7.5 Chordal graphs

Figure 10 presents timing results for chordal random graphs, including
dense and sparse graphs. The parallel implementation is slower only for
sparse graphs. The figure also shows that the parallel implementation
is stable: its running time is independent of the number of edges,
in contrast to the sequential implementation.


Figure 10: Chordal random graphs, N=10000 
8 Conclusion and Future Work


The main result of this paper is a parallel algorithm to test chordality
of graphs based on our own efficient parallel version of the LexBFS algorithm.
For a graph G with N vertices and M edges, the algorithm takes $O(N)$ time
and performs $O(N^2)$ work on an N-thread machine. We use the CUDA
multithreaded architecture to implement these algorithms.

Our parallel implementation achieves its best results for cliques and dense
graphs. For graphs with 1000 or more vertices, the parallel algorithm is
significantly faster than our fast sequential implementation, and for graphs
of 10000 vertices the parallel implementation is twice as fast as the sequential
version. For trees, sparse graphs and small graphs (fewer than 1000
vertices) the sequential algorithm outperforms the parallel one. However,
for this kind of data the parallel implementation is stable: its execution
time is independent of the number of edges of the graph.

It would be interesting to see whether the parallel LexBFS algorithm could
serve as the core of an efficient parallel test for interval graphs. Further
research could also be directed towards a parallel implementation of the MCS
(Maximum Cardinality Search) algorithm.